Abstract

In this short preliminary note I apply the methodology of game-theoretic
probability to calculating non-asymptotic confidence intervals for the
coefficient of a simple first-order scalar autoregressive model. The most
distinctive feature of the proposed procedure is that with high probability
it produces confidence intervals that always cover the true parameter value
when applied sequentially.

1 Introduction 

Game-theoretic probability (see, e.g., [6], with the basic idea going back to
Ville [7]) provides a means of testing probabilistic models. In this note the
game-theoretic methodology is extended to statistical models; it will be
demonstrated on the first-order scalar autoregressive model

$$ y_t = a y_{t-1} + \epsilon_t, \quad t = 1, 2, \ldots, \tag{1} $$

without the intercept term, with constant $y_0$, and with independent $N(0, 1)$
innovations $\epsilon_t$.

We will be interested in procedures for computing, for each $t = 1, 2, \ldots$, a
confidence interval $[l_t, u_t]$ for $a$ given $y_0, \ldots, y_t$. Let us fix a
confidence level $1 - \delta$, and let $a$ be the true parameter value. The usual
procedures are "batch", in that they only guarantee that $a \in [l_t, u_t]$ with
high probability for a fixed $t$. It is usually true that, when they are applied
sequentially, the intersection $\cap_{t=1}^{\infty} [l_t, u_t]$ is empty with
probability one. Our goal is to guarantee that

$$ a \in \cap_{t=1}^{\infty} [l_t, u_t] \tag{2} $$

with probability at least $1 - \delta$.

Analogously to the usual classification of the limit theorems of probability
theory into "strong" (involving the conjunction over all $t$) and "weak"
(applicable to individual $t$), let us call such confidence intervals strong. In
particular, confidence intervals satisfying (2) with probability at least
$1 - \delta$ will be called strong $(1 - \delta)$-confidence intervals.
Accordingly, confidence intervals produced by the standard procedures will be
referred to as weak; weak $(1 - \delta)$-confidence intervals satisfy
$a \in [l_t, u_t]$ with probability at least $1 - \delta$ for each individual $t$.
(This probability is sometimes required to be precisely $1 - \delta$, but we will
only consider the "conservative" definitions.)

To achieve the goal (2), for each possible value of the parameter $a$ we
construct random variables $S_t^a$, $t = 0, 1, \ldots$, that form a nonnegative
martingale under the probability measure $P_a$ corresponding to the
probabilistic model (1) with the given $a$. It will also be true that
$S_0^a = 1$; such sequences (nonnegative martingales starting from 1) will be
called martingale tests. We can then set

$$ [l_t, u_t] = \{ a : S_t^a < 1/\delta \} $$

(assuming that the set on the right-hand side is an interval, which it will be in
our case). The special case

$$ P_a \left\{ \sup_{t=0,1,\ldots} S_t^a \ge 1/\delta \right\} \le \delta $$

(due to Ville; see, e.g., [7], p. 100, or [6], (2.12)) of Doob's inequality shows
that (2) will indeed be true with probability at least $1 - \delta$.



2 Derivation of strong confidence intervals 

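If the true probability density of $y_t$ (conditional on the past) is

$$ \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(y_t - a^{\text{true}} y_{t-1})^2}{2} \right) $$

and we want to reject the hypothesis

$$ \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(y_t - a y_{t-1})^2}{2} \right), $$

the best, in many respects,^1 martingale test is the likelihood ratio sequence
with the relative increments

$$ \frac{\exp\left( -(y_t - a^{\text{true}} y_{t-1})^2 / 2 \right)}{\exp\left( -(y_t - a y_{t-1})^2 / 2 \right)} = \exp\left( \frac{(a^2 - (a^{\text{true}})^2) y_{t-1}^2 + 2 (a^{\text{true}} - a) y_{t-1} y_t}{2} \right). $$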

^1 Cf., e.g., the nonnegativity of the Kullback-Leibler divergence, the
Neyman-Pearson lemma, and the optimality property of the probability ratio test
in sequential analysis.
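The product over $t = 1, \ldots, T$ is the martingale test itself:

$$ S_T = \exp\left( \frac{(a^2 - (a^{\text{true}})^2) \Gamma_0 + 2 (a^{\text{true}} - a) \Gamma_1}{2} \right), \tag{3} $$

where

$$ \Gamma_0 := \sum_{t=1}^{T} y_{t-1}^2 $$

and

$$ \Gamma_1 := \sum_{t=1}^{T} y_{t-1} y_t. $$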
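The passage from the relative increments to (3) can be checked numerically. The following is a minimal sketch, not the author's code: the function names are hypothetical, and the starting value `y0 = 0.0` and the seed are assumptions made only for illustration. It simulates model (1), computes $\Gamma_0$ and $\Gamma_1$, and compares the logarithm of (3) with the sum of the logarithms of the relative increments.

```python
import math
import random


def simulate_ar1(a_true, T, y0=0.0, seed=0):
    """Simulate y_0, ..., y_T from y_t = a_true * y_{t-1} + eps_t, eps_t ~ N(0, 1).

    y0 = 0.0 and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    y = [y0]
    for _ in range(T):
        y.append(a_true * y[-1] + rng.gauss(0.0, 1.0))
    return y


def gammas(y):
    """Return (Gamma_0, Gamma_1): sums of y_{t-1}^2 and y_{t-1} * y_t over t = 1..T."""
    g0 = sum(prev * prev for prev in y[:-1])
    g1 = sum(prev * cur for prev, cur in zip(y[:-1], y[1:]))
    return g0, g1


def log_capital_closed_form(y, a, a_alt):
    """Logarithm of (3): a is the tested value, a_alt plays the role of a^true."""
    g0, g1 = gammas(y)
    return 0.5 * ((a * a - a_alt * a_alt) * g0 + 2.0 * (a_alt - a) * g1)


def log_capital_incremental(y, a, a_alt):
    """Sum of the log relative increments (log-likelihood ratio, step by step)."""
    total = 0.0
    for prev, cur in zip(y[:-1], y[1:]):
        total += -0.5 * (cur - a_alt * prev) ** 2 + 0.5 * (cur - a * prev) ** 2
    return total
```

Working with logarithms avoids overflow when $T$ is large; the two functions should agree up to rounding error for any data.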



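To get rid of the parameter $a^{\text{true}}$, let us integrate (3) over the
probability distribution $N(a, \sigma^2)$ on the $a^{\text{true}}$s:

$$ S_T^a = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( \frac{(a^2 - (a^{\text{true}})^2) \Gamma_0 + 2 (a^{\text{true}} - a) \Gamma_1}{2} \right) \exp\left( -\frac{(a - a^{\text{true}})^2}{2\sigma^2} \right) da^{\text{true}} $$

$$ = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\left( \frac{\Gamma_0}{2} + \frac{1}{2\sigma^2} \right) x^2 + (\Gamma_1 - a \Gamma_0) x \right) dx $$

(where I made the substitution $x = a^{\text{true}} - a$). Now the formula

$$ \int_{-\infty}^{\infty} \exp(-A x^2 + B x) \, dx = \sqrt{\frac{\pi}{A}} \exp\left( \frac{B^2}{4A} \right) $$

gives

$$ S_T^a = \frac{1}{\sqrt{\sigma^2 \Gamma_0 + 1}} \exp\left( \frac{1}{2} \, \frac{\sigma^2 (\Gamma_1 - a \Gamma_0)^2}{\sigma^2 \Gamma_0 + 1} \right). \tag{4} $$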
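As a sanity check on the Gaussian integral, the closed form (4) can be compared with a direct numerical integration of (3) against the $N(a, \sigma^2)$ density. This is only a sketch under stated assumptions: the function names, the midpoint rule, the truncation of the integral at ten standard deviations, and the input values used to exercise it are all illustrative choices, not part of the original note.

```python
import math


def capital_closed_form(a, g0, g1, sigma):
    """Formula (4): the mixture martingale S_T^a as a function of Gamma_0, Gamma_1."""
    v = sigma * sigma * g0 + 1.0
    return math.exp(0.5 * sigma * sigma * (g1 - a * g0) ** 2 / v) / math.sqrt(v)


def capital_by_quadrature(a, g0, g1, sigma, half_width=10.0, n=20_000):
    """Midpoint-rule integral of (3) over a_alt = a + x with weight N(a, sigma^2).

    The range x in [-half_width * sigma, half_width * sigma] is an assumption
    that the integrand is negligible outside it.
    """
    lo = -half_width * sigma
    h = 2.0 * half_width * sigma / n
    log_norm = math.log(math.sqrt(2.0 * math.pi) * sigma)
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        a_alt = a + x
        log_lr = 0.5 * ((a * a - a_alt * a_alt) * g0 + 2.0 * (a_alt - a) * g1)
        log_density = -0.5 * (x / sigma) ** 2 - log_norm
        total += math.exp(log_lr + log_density)
    return total * h
```

For moderate values such as `g0 = 30.0`, `g1 = 25.0`, `a = 0.7`, `sigma = 0.5`, the two functions should agree to several digits; for large $\Gamma_0$ the quadrature version may overflow, which is one reason to prefer the closed form.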



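To find the confidence intervals corresponding to (4), fix a confidence level
$1 - \delta$. The $(1 - \delta)$-confidence interval corresponding to (4) is
defined as the set of $a$s satisfying

$$ \frac{1}{\sqrt{\sigma^2 \Gamma_0 + 1}} \exp\left( \frac{1}{2} \, \frac{\sigma^2 (\Gamma_1 - a \Gamma_0)^2}{\sigma^2 \Gamma_0 + 1} \right) < \frac{1}{\delta}. $$

Solving this in $a$ gives the confidence interval

$$ \left| a - \frac{\Gamma_1}{\Gamma_0} \right| < \sqrt{ \frac{\sigma^2 \Gamma_0 + 1}{\sigma^2 \Gamma_0^2} \ln \frac{\sigma^2 \Gamma_0 + 1}{\delta^2} }. \tag{5} $$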
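Formula (5) translates directly into a few lines of code. The sketch below uses hypothetical function names, and the numbers used to exercise it ($\Gamma_0 = 2778$, roughly $T/(1 - a^2)$ for $T = 1000$ and $a = 0.8$) are an illustrative assumption rather than the data behind the tables. It also lets one confirm that the capital (4) equals $1/\delta$ exactly at the endpoints of the interval.

```python
import math


def capital(a, g0, g1, sigma):
    """Formula (4) for the mixture martingale S_T^a."""
    v = sigma * sigma * g0 + 1.0
    return math.exp(0.5 * sigma * sigma * (g1 - a * g0) ** 2 / v) / math.sqrt(v)


def strong_interval(g0, g1, sigma, delta):
    """Endpoints of the strong (1 - delta)-confidence interval (5).

    Requires g0 > 0; the interval is centred at the least-squares estimate g1 / g0.
    """
    v = sigma * sigma * g0 + 1.0
    centre = g1 / g0
    half = math.sqrt(v * math.log(v / delta ** 2)) / (sigma * g0)
    return centre - half, centre + half


# Illustrative inputs (assumed, not taken from the note's data set).
g0_demo, g1_demo = 2778.0, 0.8 * 2778.0
lower, upper = strong_interval(g0_demo, g1_demo, sigma=0.1, delta=0.01)
```

With these inputs the width `upper - lower` comes out at about 0.14, the same order as the strong interval reported for the stationary case.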



Notice that, in the stationary case $|a| < 1$, where $\Gamma_0$ has the order of
magnitude $T$, the size of the confidence interval (5) is $O(\sqrt{\ln T / T})$
as $T \to \infty$. This is worse than the usual iterated-logarithm behaviour
($O(\sqrt{\ln \ln T / T})$) but agrees with [4], Theorem 2.5 (although the latter
result is just an upper bound). One can speculate that, in the stationary case,
the $O(\sqrt{\ln \ln T / T})$ behaviour will be recovered if the
$N(a, \sigma^2)$ is replaced by a probability distribution that is more
concentrated around $a$, as in Ville's [7] proof of the law of the iterated
logarithm (see also [6], Chapter 5).

Most of the terms in the confidence interval (5) are familiar from the
literature (which, however, mainly covers the case of weak confidence
intervals). The centre $\Gamma_1 / \Gamma_0$ of the interval is just the
least-squares estimate of $a$ from the given sample. The statistic

$$ t_T := \sqrt{\Gamma_0} \left( \frac{\Gamma_1}{\Gamma_0} - a \right) \tag{6} $$

(for a fixed sample size $T$) has been studied extensively. In describing the
known results I will follow [2]. Mann and Wald [3] showed that $t_T$ is
$N(0, 1)$ asymptotically when $|a| < 1$. Anderson [1] extended this to the case
$|a| > 1$. White [8] and Rao [5] showed that, in the case $|a| = 1$, $t_T$
converges in distribution to

$$ \frac{W^2(1) - 1}{2 \sqrt{\int_0^1 W^2(s) \, ds}}, \tag{7} $$

where $W$ is a standard Brownian motion.

Suppose, for concreteness, that (6) is asymptotically $N(0, 1)$. The central
asymptotic weak confidence interval for $a$ based on the statistic (6) will
differ from (5), once the factor $(\sigma^2 \Gamma_0 + 1) / (\sigma^2 \Gamma_0)$
in (5) is approximated by 1, in that

$$ \sqrt{ \ln \frac{\sigma^2 \Gamma_0 + 1}{\delta^2} } = \sqrt{ 2 \ln \frac{1}{\delta} + \ln(\sigma^2 \Gamma_0 + 1) } \tag{8} $$

will be replaced by the upper $\delta/2$-quantile of $N(0, 1)$, essentially by

$$ \sqrt{ 2 \ln \frac{1}{\delta} } $$

for a small $\delta$. This is close to the first addend on the right-hand side of
(8), and so the second addend represents the price that we are paying for our
confidence intervals being strong.
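The comparison between the weak and strong multipliers can be made concrete. The sketch below is an illustration with hypothetical function names: it computes the upper $\delta/2$-quantile of $N(0, 1)$ with the standard library's `statistics.NormalDist` and the strong multiplier from the right-hand side of (8). The values of $\Gamma_0$ and $\sigma$ passed to it are assumptions to be supplied by the reader.

```python
import math
from statistics import NormalDist


def weak_multiplier(delta):
    """Upper delta/2-quantile of N(0, 1), used by the asymptotic weak interval."""
    return NormalDist().inv_cdf(1.0 - delta / 2.0)


def strong_multiplier(g0, sigma, delta):
    """Right-hand side of (8): sqrt(2 ln(1/delta) + ln(sigma^2 Gamma_0 + 1))."""
    return math.sqrt(2.0 * math.log(1.0 / delta) + math.log(sigma ** 2 * g0 + 1.0))


def price_of_strength(g0, sigma, delta):
    """Ratio of the strong to the weak multiplier (ignores the factor close to 1 in (5))."""
    return strong_multiplier(g0, sigma, delta) / weak_multiplier(delta)
```

Since the second addend in (8) grows like $\ln \Gamma_0$, the ratio returned by `price_of_strength` increases slowly with the amount of data for fixed $\sigma$ and $\delta$.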



3 Empirical results 

To test the test martingales (4) empirically, I generated
$y_0, \ldots, y_{1000}$ from the model (1) with a fixed $y_0$ and $a = 0.8, 1$.
The case $a = 0.8$ illustrates the stationary behaviour ($|a| < 1$), and the
"unit-root" case $a = 1$ is intermediate between the stationary and "explosive"
($|a| > 1$) behaviour. Tables 1 and 2 give the approximate weak central
99%-confidence intervals based on the above approximations for $t_T$ (normal for
$a = 0.8$ and (7) for $a = 1$) and the strong 99%-confidence intervals computed
from (5).

Type of the interval    Confidence interval    Its width
Weak (approximate)      [0.736, 0.837]         0.101
Strong                  [0.716, 0.857]         0.141

Table 1: Weak and strong 99%-confidence intervals obtained for T = 1000 and
a = 0.8 (stationary case). The value of the constant σ is 0.1.



Type of the interval    Confidence interval    Its width
Weak (approximate)      [0.982, 1.003]         0.022
Strong                  [0.977, 1.010]         0.033

Table 2: The analogue of Table 1 for a = 1 (unit root case).



The intuition behind the value of $\sigma$ in (5) is that it should be of the
same order of magnitude as the expected width of the confidence interval (since
$\sigma$ represents the order of magnitude of the distance to the bulk of the
$a^{\text{true}}$ that we are competing with). It is taken as 0.1 in the tables,
but the results will not be drastically different if $\sigma = 1$, which is
intuitively more "neutral", is chosen: e.g., the width 0.141 in Table 1 would go
up to 0.162, and the width 0.033 in Table 2 would go up to 0.037.

Figures 1 and 2 give the final values $S_{1000}^a$ for the same data set and the
same value of $\sigma$, $\sigma = 0.1$.
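The sequential guarantee behind the strong intervals can be probed with a small Monte Carlo experiment. The sketch below is not the experiment reported in the tables: the function names, the starting value `y0 = 0.0`, the seed, the shorter horizon and the number of runs are all assumptions chosen to keep the example fast. It counts how often the true coefficient leaves the interval (5) at some step $t \le T$; by Ville's inequality the expected fraction of such runs is at most $\delta$.

```python
import math
import random


def strong_interval(g0, g1, sigma, delta):
    """Strong interval (5); the whole real line while Gamma_0 is still zero."""
    if g0 == 0.0:
        return -math.inf, math.inf
    v = sigma * sigma * g0 + 1.0
    centre = g1 / g0
    half = math.sqrt(v * math.log(v / delta ** 2)) / (sigma * g0)
    return centre - half, centre + half


def always_covers(a_true, T, sigma, delta, rng, y0=0.0):
    """True if a_true lies in the strong interval at every step t = 1..T."""
    g0 = g1 = 0.0
    prev = y0
    for _ in range(T):
        cur = a_true * prev + rng.gauss(0.0, 1.0)
        g0 += prev * prev
        g1 += prev * cur
        lower, upper = strong_interval(g0, g1, sigma, delta)
        if not (lower < a_true < upper):
            return False
        prev = cur
    return True


def failure_rate(a_true, T, sigma, delta, runs, seed=0):
    """Fraction of simulated paths on which the strong interval ever misses a_true."""
    rng = random.Random(seed)
    failures = sum(
        not always_covers(a_true, T, sigma, delta, rng) for _ in range(runs)
    )
    return failures / runs
```

A call such as `failure_rate(0.8, 200, 0.1, 0.05, 200)` should return a value no larger than about $\delta$, up to Monte Carlo error; because the procedure is conservative, the observed rate is typically well below that.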

4 Directions of further research 

These are some possible areas in which the methods of martingale testing could 
be applied: 

Online testing of statistical models. When the strong confidence interval
$[l_t, u_t]$ becomes empty, the statistical model can be rejected. Of course,
efficient testing of statistical models will require different martingale tests:
it will not be sufficient to consider, as in this note, different values of
parameters as alternatives.

Prediction. In the simplest case, the prediction interval at step $t$ might be
computed as the union of the prediction intervals corresponding to all
$a \in [l_t, u_t]$.

Alternative assumptions about innovations. For example, the assumption that
$\epsilon_t$ have zero medians (conditional on the past) might lead to feasible
statistical procedures.
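The first two directions can be illustrated with a small sketch. It rests on assumptions that go beyond the note: "becoming empty" is read here as the running intersection of the intervals obtained so far becoming empty, the per-coefficient prediction interval is taken to be $a y_t \pm$ `half_width` with `half_width` a hypothetical parameter chosen by the user, and all function names are illustrative.

```python
import math


def running_intersection(intervals):
    """Yield the intersection of the first t open intervals, or None once it is empty."""
    lo, hi = -math.inf, math.inf
    for lower, upper in intervals:
        lo, hi = max(lo, lower), min(hi, upper)
        yield (lo, hi) if lo < hi else None


def first_rejection_time(intervals):
    """First step t (1-based) at which the running intersection is empty, else None."""
    for t, current in enumerate(running_intersection(intervals), start=1):
        if current is None:
            return t
    return None


def prediction_union(lower, upper, y_last, half_width):
    """Union over a in [lower, upper] of the intervals a * y_last +/- half_width."""
    lo_mean, hi_mean = sorted((lower * y_last, upper * y_last))
    return lo_mean - half_width, hi_mean + half_width
```

For example, `first_rejection_time([(0, 2), (1, 3), (2.5, 4)])` returns 3, since the third interval has no point in common with the intersection of the first two.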

Figure 1: The capital $S_{1000}^a$ achieved for various values of $a$ when the
true coefficient $a$ is 0.8.

Figure 2: The analogue of Figure 1 for $a = 1$.